baseline value
Analyzing Fairness of Classification Machine Learning Model with Structured Dataset
Rashed, Ahmed, Kallich, Abdelkrim, Eltayeb, Mohamed
Machine learning (ML) algorithms have become integral to decision making in various domains, including healthcare, finance, education, and law enforcement. However, concerns about fairness and bias in these systems pose significant ethical and social challenges. This study investigates the fairness of ML models applied to structured datasets in classification tasks, highlighting the potential for biased predictions to perpetuate systemic inequalities. A publicly available dataset from Kaggle was selected for analysis, offering a realistic scenario for evaluating fairness in machine learning workflows. To assess and mitigate biases, three prominent fairness libraries; Fairlearn by Microsoft, AIF360 by IBM, and the What If Tool by Google were employed. These libraries provide robust frameworks for analyzing fairness, offering tools to evaluate metrics, visualize results, and implement bias mitigation strategies. The research aims to assess the extent of bias in the ML models, compare the effectiveness of these libraries, and derive actionable insights for practitioners. The findings reveal that each library has unique strengths and limitations in fairness evaluation and mitigation. By systematically comparing their capabilities, this study contributes to the growing field of ML fairness by providing practical guidance for integrating fairness tools into real world applications. These insights are intended to support the development of more equitable machine learning systems.
Reinforcement learning to learn quantum states for Heisenberg scaling accuracy
Jae, Jeongwoo, Hong, Jeonghoon, Choo, Jinho, Kwon, Yeong-Dae
Learning quantum states is a crucial task for realizing the potential of quantum information technology. Recently, neural approaches have emerged as promising methods for learning quantum states. We propose a meta-learning model that employs reinforcement learning (RL) to optimize the process of learning quantum states. For learning quantum states, our scheme trains a Hardware efficient ansatz with a blackbox optimization algorithm, called evolution strategy (ES). To enhance the efficiency of ES, a RL agent dynamically adjusts the hyperparameters of ES. To facilitate the RL training, we introduce an action repetition strategy inspired by curriculum learning. The RL agent significantly improves the sample efficiency of learning random quantum states, and achieves infidelity scaling close to the Heisenberg limit. We showcase that the RL agent trained using 3-qubit states can be generalized to learning up to 5-qubit states. These results highlight the utility of RL-driven meta-learning to enhance the efficiency and generalizability of learning quantum states. Our approach can be applicable to improve quantum control, quantum optimization, and quantum machine learning.
Can I Pet Your Robot? Incorporating Capacitive Touch Sensing into a Soft Socially Assistive Robot Platform
O'Connell, Amy, Cislowski, Bailey, Culbertson, Heather, Matariฤ, Maja
Abstract-- This work presents a method of incorporating lowcost capacitive tactile sensors on a soft socially assistive robot platform. By embedding conductive thread into the robot's crocheted exterior, we formed a set of low-cost, flexible capacitive tactile sensors that do not disrupt the robot's soft, zoomorphic embodiment. We evaluated the sensors' performance through a user study (N=20) and found that the sensors reliably detected user touch events and localized touch inputs to one of three regions on the robot's exterior. Touch plays an important role in human-human and human-animal social interactions, but it is not often utilized in social interactions between humans and robots. Most Flexibility and Cohesiveness Sensors should be easy to reconfigure commercially available platforms for robotics research do for different interaction contexts without disrupting not incorporate tactile sensing, and those that do cannot Blossom's soft, handcrafted embodiment. In this work, we propose and A. Physical Device Design evaluate a method of incorporating low-cost capacitive tactile We embedded conductive thread into the crocheted exterior sensors onto the Blossom open-source soft robot platform [1] to enable Blossom to detect touch from the user.
Root Causing Prediction Anomalies Using Explainable AI
Vishnampet, Ramanathan, Shenoy, Rajesh, Chen, Jianhui, Gupta, Anuj
This paper presents a novel application of explainable AI (XAI) for root-causing performance degradation in machine learning models that learn continuously from user engagement data. In such systems a single feature corruption can cause cascading feature, label and concept drifts. We have successfully applied this technique to improve the reliability of models used in personalized advertising. Performance degradation in such systems manifest as prediction anomalies in the models. These models are typically trained continuously using features that are produced by hundreds of real time data processing pipelines or derived from other upstream models. A failure in any of these pipelines or an instability in any of the upstream models can cause feature corruption, causing the model's predicted output to deviate from the actual output and the training data to become corrupted. The causal relationship between the features and the predicted output is complex, and root-causing is challenging due to the scale and dynamism of the system. We demonstrate how temporal shifts in the global feature importance distribution can effectively isolate the cause of a prediction anomaly, with better recall than model-to-feature correlation methods. The technique appears to be effective even when approximating the local feature importance using a simple perturbation-based method, and aggregating over a few thousand examples. We have found this technique to be a model-agnostic, cheap and effective way to monitor complex data pipelines in production and have deployed a system for continuously analyzing the global feature importance distribution of continuously trained models.
Fluctuation-based Adaptive Structured Pruning for Large Language Models
An, Yongqi, Zhao, Xu, Yu, Tao, Tang, Ming, Wang, Jinqiao
Network Pruning is a promising way to address the huge computing resource demands of the deployment and inference of Large Language Models (LLMs). Retraining-free is important for LLMs' pruning methods. However, almost all of the existing retraining-free pruning approaches for LLMs focus on unstructured pruning, which requires specific hardware support for acceleration. In this paper, we propose a novel retraining-free structured pruning framework for LLMs, named FLAP (FLuctuation-based Adaptive Structured Pruning). It is hardware-friendly by effectively reducing storage and enhancing inference speed. For effective structured pruning of LLMs, we highlight three critical elements that demand the utmost attention: formulating structured importance metrics, adaptively searching the global compressed model, and implementing compensation mechanisms to mitigate performance loss. First, FLAP determines whether the output feature map is easily recoverable when a column of weight is removed, based on the fluctuation pruning metric. Then it standardizes the importance scores to adaptively determine the global compressed model structure. At last, FLAP adds additional bias terms to recover the output feature maps using the baseline values. We thoroughly evaluate our approach on a variety of language benchmarks. Without any retraining, our method significantly outperforms the state-of-the-art methods, including LLM-Pruner and the extension of Wanda in structured pruning. The code is released at https://github.com/CASIA-IVA-Lab/FLAP.
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Li, Ziniu, Xu, Tian, Zhang, Yushun, Lin, Zhihang, Yu, Yang, Sun, Ruoyu, Luo, Zhi-Quan
Alignment is crucial for training large language models. The predominant strategy is Reinforcement Learning from Human Feedback (RLHF), with Proximal Policy Optimization (PPO) as the de-facto algorithm. Yet, PPO is known to struggle with computational inefficiency, a challenge that this paper aims to address. We identify three important properties of RLHF tasks: fast simulation, deterministic transitions, and trajectory-level rewards, which are not leveraged in PPO. Based on these properties, we develop ReMax, a new algorithm tailored for RLHF. The design of ReMax builds on the celebrated algorithm REINFORCE but is enhanced with a new variance-reduction technique. ReMax offers threefold advantages over PPO: first, it is simple to implement with just 6 lines of code. It further eliminates more than 4 hyper-parameters in PPO, which are laborious to tune. Second, ReMax reduces memory usage by about 50%. To illustrate, PPO runs out of memory when fine-tuning a Llama2-7B model on A100-80GB GPUs, whereas ReMax can support the training. Even though memory-efficient techniques (e.g., ZeRO and offload) are employed for PPO to afford training, ReMax can utilize a larger batch size to increase throughput. Third, in terms of wall-clock time, PPO is about twice as slow as ReMax per iteration. Importantly, these improvements do not sacrifice task performance. We hypothesize that these advantages can be maintained in larger-scale models.
Reducing the Cost of Cycle-Time Tuning for Real-World Policy Optimization
Farrahi, Homayoon, Mahmood, A. Rupam
Continuous-time reinforcement learning tasks commonly use discrete steps of fixed cycle times for actions. As practitioners need to choose the action-cycle time for a given task, a significant concern is whether the hyper-parameters of the learning algorithm need to be re-tuned for each choice of the cycle time, which is prohibitive for real-world robotics. In this work, we investigate the widely-used baseline hyper-parameter values of two policy gradient algorithms -- PPO and SAC -- across different cycle times. Using a benchmark task where the baseline hyper-parameters of both algorithms were shown to work well, we reveal that when a cycle time different than the task default is chosen, PPO with baseline hyper-parameters fails to learn. Moreover, both PPO and SAC with their baseline hyper-parameters perform substantially worse than their tuned values for each cycle time. We propose novel approaches for setting these hyper-parameters based on the cycle time. In our experiments on simulated and real-world robotic tasks, the proposed approaches performed at least as well as the baseline hyper-parameters, with significantly better performance for most choices of the cycle time, and did not result in learning failure for any cycle time. Hyper-parameter tuning still remains a significant barrier for real-world robotics, as our approaches require some initial tuning on a new task, even though it is negligible compared to an extensive tuning for each cycle time. Our approach requires no additional tuning after the cycle time is changed for a given task and is a step toward avoiding extensive and costly hyper-parameter tuning for real-world policy optimization.
Variance Reduction for Score Functions Using Optimal Baselines
Many problems involve the use of models which learn probability distributions or incorporate randomness in some way. In such problems, because computing the true expected gradient may be intractable, a gradient estimator is used to update the model parameters. When the model parameters directly affect a probability distribution, the gradient estimator will involve score function terms. This paper studies baselines, a variance reduction technique for score functions. Motivated primarily by reinforcement learning, we derive for the first time an expression for the optimal state-dependent baseline, the baseline which results in a gradient estimator with minimum variance. Although we show that there exist examples where the optimal baseline may be arbitrarily better than a value function baseline, we find that the value function baseline usually performs similarly to an optimal baseline in terms of variance reduction. Moreover, the value function can also be used for bootstrapping estimators of the return, leading to additional variance reduction. Our results give new insight and justification for why value function baselines and the generalized advantage estimator (GAE) work well in practice.
Federated Hypergradient Descent
In this work, we explore combining automatic hyperparameter tuning and optimization for federated learning (FL) in an online, one-shot procedure. We apply a principled approach on a method for adaptive client learning rate, number of local steps, and batch size. In our federated learning applications, our primary motivations are minimizing communication budget as well as local computational resources in the training pipeline. Conventionally, hyperparameter tuning methods involve at least some degree of trial-and-error, which is known to be sample inefficient. In order to address our motivations, we propose FATHOM (Federated AuTomatic Hyperparameter OptiMization) as a one-shot online procedure. We investigate the challenges and solutions of deriving analytical gradients with respect to the hyperparameters of interest. Our approach is inspired by the fact that, with the exception of local data, we have full knowledge of all components involved in our training process, and this fact can be exploited in our algorithm impactfully. We show that FATHOM is more communication efficient than Federated Averaging (FedAvg) with optimized, static valued hyperparameters, and is also more computationally efficient overall. As a communication efficient, one-shot online procedure, FATHOM solves the bottleneck of costly communication and limited local computation, by eliminating a potentially wasteful tuning process, and by optimizing the hyperparamters adaptively throughout the training procedure without trial-and-error. We show our numerical results through extensive empirical experiments with the Federated EMNIST-62 (FEMNIST) and Federated Stack Overflow (FSO) datasets, using FedJAX as our baseline framework.
Towards Axiomatic, Hierarchical, and Symbolic Explanation for Deep Models
Ren, Jie, Li, Mingjie, Chen, Qirui, Deng, Huiqi, Zhang, Quanshi
This paper proposes a hierarchical and symbolic And-Or graph (AOG) to objectively explain the internal logic encoded by a well-trained deep model for inference. We first define the objectiveness of an explainer model in game theory, and we develop a rigorous representation of the And-Or logic encoded by the deep model. The objectiveness and trustworthiness of the AOG explainer are both theoretically guaranteed and experimentally verified. Furthermore, we propose several techniques to boost the conciseness of the explanation.